
74 research outputs found

Sample Efficient Monte Carlo Tree Search for Robotics

Author: Dam Tuan
Publication date: 01/01/2023

Artificially intelligent agents that behave like humans have become a defining theme and one of the main goals driving the rapid development of deep learning, particularly reinforcement learning (RL), in recent years. Monte-Carlo Tree Search (MCTS) is a class of methods for solving complex decision-making problems through the synergy of Monte-Carlo planning and RL. MCTS has yielded impressive results in Go (AlphaGo), Chess (AlphaZero), and video games, and it has been further exploited successfully in motion planning, autonomous car driving, and autonomous robotic assembly tasks. Many of the MCTS successes rely on coupling MCTS with neural networks trained using RL methods such as Deep Q-Learning, to speed up the learning of large-scale problems. Despite achieving state-of-the-art performance, the highly combinatorial nature of the problems commonly addressed by MCTS requires efficient exploration-exploitation strategies for navigating the planning tree and quickly convergent value backup methods. Furthermore, large-scale problems such as Go and Chess require a sample-efficient method to build an effective planning tree, which is crucial in on-the-fly decision making. These problems are particularly acute in recent advances that combine MCTS with deep neural networks for function approximation. In addition, despite the recent success of applying MCTS to various autonomous robotics tasks, most of these scenarios are partially observable and require an advanced planning method for complex, unstructured environments. This thesis aims to tackle the following question: How can robots plan efficiently under highly stochastic dynamics and partial observability? The following paragraphs address this question. First, we propose a novel backup strategy that uses the power mean operator, which computes a value between the average and maximum value.
We call our new approach Power Mean Upper Confidence bound Tree (Power-UCT). We theoretically analyze our method, providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known Markov decision process (MDP) and partially observable Markov decision process (POMDP) benchmarks, showing significant improvement in terms of sample efficiency and convergence speed w.r.t. state-of-the-art algorithms. Second, we investigate an efficient exploration-exploitation planning strategy by providing a comprehensive theoretical convex regularization framework in MCTS. We derive the first regret analysis of regularized MCTS, showing that it guarantees an exponential convergence rate. Subsequently, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS based on the relative entropy of the policy update and, more importantly, on the Tsallis entropy of the policy, for which we prove superior theoretical guarantees. Afterward, we empirically verify the consequences of our theoretical results on a toy problem. We then show how our framework can easily be incorporated in AlphaGo, and we empirically show the superiority of convex regularization, w.r.t. representative baselines, on well-known RL problems across several Atari games. Next, we take a further step to draw the connection between the two methods, Power-UCT and the convex regularization in MCTS, providing a rigorous theoretical study on the effectiveness of α-divergence in online Monte-Carlo planning. We show how the two methods can be related by using α-divergence. We additionally provide an in-depth study on the range of the α parameter that helps to trade off exploration and exploitation in MCTS, hence showing how α-divergence can achieve state-of-the-art results in complex tasks. Finally, we investigate a novel algorithmic formulation of the popular MCTS algorithm for robot path planning.
Notably, we study Monte-Carlo Path Planning (MCPP) by analyzing and proving, on the one hand, its exponential convergence rate to the optimal path in fully observable MDPs, and on the other hand, its probabilistic completeness for finding feasible paths in POMDPs (proof sketch) assuming limited distance observability. Our algorithmic contribution allows us to employ recently proposed variants of MCTS with different exploration strategies for robot path planning. Our experimental evaluations in simulated 2D and 3D environments with a 7-degree-of-freedom (DOF) manipulator and in a real-world robot path planning task demonstrate the superiority of MCPP in POMDP tasks. In summary, this thesis proposes and analyzes novel value backup operators and policy selection strategies from both theoretical and experimental perspectives to help cope with sample efficiency and exploration-exploitation trade-off problems in MCTS, and brings these advanced methods to robot path planning, showing their superiority in POMDPs w.r.t. state-of-the-art methods.
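
The power mean backup mentioned in the abstract above can be illustrated with a minimal sketch. It assumes non-negative child value estimates, weights equal to normalized visit counts, and an exponent p >= 1; the function name power_mean and the sample numbers are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def power_mean(values: Sequence[float], visits: Sequence[int], p: float) -> float:
    """Visit-weighted power mean of non-negative child value estimates.

    p = 1 gives the weighted average; as p grows the result approaches
    the maximum child value. (Illustrative sketch, not the thesis code.)
    """
    if p < 1:
        raise ValueError("this sketch assumes p >= 1")
    total = sum(visits)
    if total == 0:
        raise ValueError("at least one child must have been visited")
    # Weighted sum of values raised to the power p, then the p-th root.
    weighted = sum((n / total) * (v ** p) for v, n in zip(values, visits))
    return weighted ** (1.0 / p)


# Hypothetical node with two equally visited children.
child_values = [0.2, 0.8]
child_visits = [1, 1]
average_backup = power_mean(child_values, child_visits, p=1)    # plain average
sharper_backup = power_mean(child_values, child_visits, p=100)  # close to the max
```

With p = 1 the sketch reduces to the usual average backup, while larger p moves the backed-up value toward the best child, which is the interpolation the abstract describes.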

TUbiblio

tuprints

Monte-Carlo tree search with uncertainty propagation via optimal transport

Author: D'Eramo Carlo
Dam Tuan
Maillard Odalric-Ambrym
Pajarinen Joni
Schneider Lukas
Stenger Pascal
Publication date: 19/09/2023

This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS) designed for highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions. We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node. We study our novel backup operator when using a novel combination of L^1-Wasserstein barycenter with α-divergence, by drawing a notable connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to the optimal policy, and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines
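
To make the idea of a barycenter backup over Gaussian nodes concrete, here is a small sketch. It deliberately uses the standard 2-Wasserstein barycenter of one-dimensional Gaussians, whose closed form is the weighted average of the means and of the standard deviations, as a simpler stand-in; it is not the L^1-Wasserstein with α-divergence operator studied in the paper. The function name and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def gaussian_w2_barycenter(
    means: Sequence[float],
    stds: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, float]:
    """2-Wasserstein barycenter of 1D Gaussians N(mean_i, std_i^2).

    For 1D Gaussians, W2^2 = (m1 - m2)^2 + (s1 - s2)^2, so the barycenter
    is the Gaussian whose mean and std are the weighted averages.
    (Stand-in illustration; the paper's operator differs.)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [w / total for w in weights]
    mean = sum(w * m for w, m in zip(norm, means))
    std = sum(w * s for w, s in zip(norm, stds))
    return mean, std


# Hypothetical value node with two action-value children,
# weighted by how often each action was selected.
node_mean, node_std = gaussian_w2_barycenter(
    means=[1.0, 3.0], stds=[0.5, 1.5], weights=[3, 1]
)
```

The backed-up node keeps a standard deviation alongside its mean, so uncertainty from the children is carried toward the root rather than being discarded by a point-estimate backup.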

arXiv.org e-Print Archive

Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering

Author: Dam Nhan
Dernoncourt Franck
Lai Tuan
Le Nham
Le Trung
Phung Dinh
Tran Quan
Publication date: 01/01/2020

Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, humans tend to rely on similar situations and/or associations from the past. Hence, arguably, a promising approach to making the model transparent is to design it such that the model explicitly connects the current sample with previously seen ones and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidence to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e., TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications. Comment: Accepted to COLING 202

arXiv.org e-Print Archive

Crossref

Monash University Research Portal

Study on chemical constituents of the lichen Parmotrema sancti-angelii (Lynge) Hale. (Parmeliaceae)

Author: Dam Nguyen Phuc
Hong Thanh Nguyen Pham
Ngoc Van Nguyen Thi
Thu Tram Nguyen Thi
Tuan Nguyen Trong
Publication venue: 'Publishing House for Science and Technology, Vietnam Academy of Science and Technology'
Publication date: 25/08/2016

Lichens are fungal and algal/cyanobacterial symbioses resulting in the production of specific metabolites. Parmotrema sancti-angelii (Lynge) Hale is a lichen that has not been well studied chemically or biologically. For the lichen collected in Vietnam, colour reactions for identification of lichen substances (+K red, +P yellow, -C, +KC red) suggested the presence of quinones, depsides and xanthones containing two free hydroxyl groups in meta-position, and depsides and depsidones containing an aldehyde group. A study of the chemical constituents led to the isolation of three compounds: methyl β-orcinolcarboxylate (1), salazinic acid (2) and atranorin (3). Their structures were confirmed unambiguously by X-ray diffraction, spectroscopic data and comparison with reference data. This is the first report of salazinic acid in this lichen. Keywords. Parmeliaceae, Parmotrema sancti-angelii, X-ray, NMR, salazinic acid

Vietnam Academy of Science and Technology: Journals Online

Time to Osteoporosis and Major Fracture in Older Men

Author: Barrett-Connor Elizabeth
Cauley Jane A.
Cawthon Peggy M.
Dam Thuy-Tien
Diem Susan J.
Ensrud Kristine E.
Filteau Guillaume
Fine Jason P.
Fink Howard A.
Gourlay Margaret L.
Kado Deborah M.
Lane Nancy E.
Lapidus Jodi A.
Nguyen Tuan V.
Nielson Carrie M.
Orwoll Eric S.
Overman Robert A.
Schousboe John T.
Szulc Pawel
Taylor Brent C.
Wilt Timothy J.
Publication date: 01/01/2016

For older men who undergo bone mineral density (BMD) testing, the optimal osteoporosis screening schedule is unknown. Time-to-disease estimates are necessary to inform screening intervals

Carolina Digital Repository

A Bibliometric Analysis of the Global Research Trend in Child Maltreatment

Author: Arya Subash Shalini
Bhattarai Suraj
Dam Van Nhue
Dhimal Meghnath
Do Ngoc Ha
Do Thi Hoa
Ha Hai Giang
Ho C.M. Roger
Ho Manh Tung
Ho S.H. Cyrus
Hoang Quan Vuong
Latkin A. Carl
Le Thi Huong
Moir Mackenzie
Ngo Toan Anh
Nguyen Hoang Long
Nguyen Huu Tu
Nguyen Le Tuan Anh
Nguyen Quang Hung
Nguyen Vu
Pham Van Thang
Shimpuku Yoko
Tran Thanh Tung
Tran Xuan Bach
Truong Thi Nu
Vu Minh Thi Thuc
Vuong Trang Thu
Publication venue: 'MDPI AG'
Publication date: 01/07/2018

Child maltreatment remains a major health threat globally that requires the understanding of socioeconomic and cultural contexts to craft effective interventions. However, little is known about research agendas globally and the development of knowledge-producing networks in this field of study. This study aims to provide a bibliometric overview of child maltreatment publications to understand their growth from 1916 to 2018. Data from the Web of Science Core Collection were collected in May 2018. Only research articles and reviews written in the English language were included, with no restrictions by publication date. We analyzed publication years, number of papers, journals, authors, keywords and countries, and presented country collaboration and keyword co-occurrence analyses. From 1916 to 2018, 47,090 papers (53.0% in 2010–2018) were published in 9442 journals. Child Abuse & Neglect (2576 papers; 5.5%), Children and Youth Services Review (1130 papers; 2.4%) and Pediatrics (793 papers; 1.7%) published the most papers. The most common research areas were Psychology (16,049 papers; 34.1%), Family Studies (8225 papers; 17.5%), and Social Work (7367 papers; 15.6%). Among 192 countries with research publications, the most prolific countries were the United States (26,367 papers), England (4676 papers), Canada (3282 papers) and Australia (2664 papers). We identified 17 authors who had more than 60 scientific items. The most cited papers (with at least 600 citations) were published in 29 journals, headed by the Journal of the American Medical Association (JAMA) (7 papers) and the Lancet (5 papers). This overview of global research in child maltreatment indicated an increasing trend in this topic, with the world’s leading centers located in Western countries, led by the United States. We call for interdisciplinary research approaches to evaluating and intervening on child maltreatment, with a focus on low- and middle-income country (LMIC) settings and specific contexts.

Kyoto University Research Information Repository